Development of Machine Learning Algorithms Incorporating Electronic Health Record Data, Patient-Reported Outcomes, or Both to Predict Mortality for Outpatients With Cancer

PURPOSE Machine learning (ML) algorithms that incorporate routinely collected patient-reported outcomes (PROs) alongside electronic health record (EHR) variables may improve prediction of short-term mortality and facilitate earlier supportive and palliative care for patients with cancer. METHODS We trained and validated two-phase ML algorithms that incorporated standard PRO assessments alongside approximately 200 routinely collected EHR variables, among patients with medical oncology encounters at a tertiary academic oncology and a community oncology practice. RESULTS Among 12,350 patients, 5,870 (47.5%) completed PRO assessments. Compared with EHR- and PRO-only algorithms, the EHR + PRO model improved predictive performance in both tertiary oncology (EHR + PRO v EHR v PRO: area under the curve [AUC] 0.86 [0.85-0.87] v 0.82 [0.81-0.83] v 0.74 [0.74-0.74]) and community oncology (area under the curve 0.89 [0.88-0.90] v 0.86 [0.85-0.88] v 0.77 [0.76-0.79]) practices. CONCLUSION Routinely collected PROs contain added prognostic information not captured by an EHR-based ML mortality risk algorithm. Augmenting an EHR-based algorithm with PROs resulted in a more accurate and clinically relevant model, which can facilitate earlier and targeted supportive care for patients with cancer.


INTRODUCTION
For patients with cancer, early supportive care interventions, including serious illness conversations and palliative care, are evidence-based practices that improve quality of life and goal-concordant care. [1][2][3][4] However, timely identification of patients who may benefit from early supportive care is challenging: Oncology clinicians are often unable to identify patients at risk of 6-month mortality and overestimate life expectancy for up to 70% of their patients. [5][6][7][8] Interventions on the basis of electronic health record (EHR)-based machine learning (ML) prognostic algorithms increase serious illness conversations and palliative care referral and could lead to more goal-concordant care for patients with cancer. [9][10][11][12][13][14][15][16][17][18][19] However, such ML algorithms usually rely on structured EHR data, including laboratories, demographics, and diagnosis codes, which provide limited insight into patient symptoms or functional status. 20 Patient-reported outcomes (PROs), which are independently associated with mortality, 21 may augment such ML algorithms. Routine PRO assessment is now more common and may allow oncology clinicians to better identify patients with high symptom burden or declining functional status. [22][23][24][25][26] However, the role of PROs in risk stratification remains unexplored. Incorporating PROs may improve traditional risk stratification tools used for supportive and end-of-life care planning.
In this study, we trained and compared algorithms on the basis of EHR data alone, PRO data alone, and EHR plus PRO data, to estimate 6-month risk of mortality among patients seen in either a large tertiary academic practice or a community-based general oncology clinic. We hypothesized that adverse PROs would be independently associated with 6-month mortality, and that integrating routinely collected PROs into EHR-based ML algorithms would improve predictive performance compared with ML algorithms on the basis of EHR or PRO data alone in both oncology settings.

Data Source
We derived our cohort from patients receiving care at the University of Pennsylvania Health System (UPHS) who were listed in Clarity, an EPIC reporting database, which contains individual electronic medical records for patients containing demographic, comorbidity, and laboratory data. Our study followed the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD; Data Supplement) checklist for prediction model development and validation. 27 We obtained approval and waiver of informed consent from the University of Pennsylvania institutional review board before conducting this study.

Study Population
Our cohort consisted of patients age 18 years or older who had outpatient medical oncology encounters at the Perelman Center for Advanced Medicine (PCAM), a large tertiary practice with disease-specific oncology clinics, and Pennsylvania Hospital (PAH), a community oncology practice, between July 1, 2019, and January 1, 2020. We chose patients in these medical oncology clinics because (1) there has been routine collection of PROs in these clinics since mid-2019, (2) an EHR-based ML algorithm has been prospectively validated and implemented in these clinics as part of an initiative to increase serious illness conversations, 28,29 and (3) a tertiary academic practice and a community oncology site are representative of the majority of oncology care settings. Details of PRO collection can be found in the Data Supplement. Patients were not required to have received cancer-directed treatment to be included in this study. We excluded patients who had benign hematology or genetics encounters, fewer than 2 encounters during the study period, or no laboratory or comorbidity data within 6 months of the encounter. Our final cohort consisted of 12,350 patients (8,555 at PCAM and 3,795 at PAH); Appendix Fig A1 represents our cohort selection. In all statistical analyses and modeling, we used the first hematology/oncology encounter in the study period for each patient as the index encounter. We chose not to incorporate PRO data from subsequent encounters because we found that trends in PROs were not meaningfully associated with mortality (Appendix Fig A2).

Features
Features included EHR and PRO data. Our EHR data set included three broad classes of features: (1) demographic variables, (2) comorbidities, 30 and (3) laboratory data. Our final feature set consisted of approximately 200 variables from the EHR (Appendix Table A1). PRO features were derived from the PRO version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) 31 and the Patient-Reported Outcomes Measurement Information System (PROMIS) Global v.1.2 32 scales (Appendix Table A2). Further details on features are available in the Data Supplement.

Outcome
The primary outcome was death within 180 days of the index encounter at an oncology practice. We chose 180-day mortality because it is a common indicator of short-term mortality and is often used as a criterion for hospice referral. 16 Date of death was derived from the first date of death recorded in either the EHR (Clarity database) or the Social Security Administration Death Master File, matched to UPHS patients by social security number and date of birth. 33
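The outcome definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study code: the function names are hypothetical, and treating day 180 as inside the window (and ignoring deaths dated before the index encounter) is an assumption that the text does not specify.

```python
from datetime import date, timedelta
from typing import Optional


def earliest_death_date(ehr_death: Optional[date],
                        ssa_death: Optional[date]) -> Optional[date]:
    """Return the first recorded date of death across both sources, or None."""
    recorded = [d for d in (ehr_death, ssa_death) if d is not None]
    return min(recorded) if recorded else None


def died_within_180_days(index_encounter: date,
                         ehr_death: Optional[date],
                         ssa_death: Optional[date]) -> bool:
    """Label the primary outcome for one patient.

    Assumption: the window is inclusive of day 180 and starts at the
    index encounter date.
    """
    death = earliest_death_date(ehr_death, ssa_death)
    if death is None:
        return False
    return index_encounter <= death <= index_encounter + timedelta(days=180)
```

A patient with no death recorded in either source is labeled as alive at 180 days, which mirrors how a registry-based outcome is usually constructed.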

Training and Validation Set Split
In the PCAM cohort, the study population was randomly split into a training cohort (70%), in which the mortality risk algorithms were derived, and a validation cohort (30%), in which the algorithms were applied and tested. Patients could not appear in both the training and validation sets. In the PAH cohort, splitting the data set into a training and testing set was not feasible because of the much lower number of cases.
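A patient-level 70/30 split of the kind described above might look like the following Python sketch. The function name, the seed, and the rounding rule are assumptions for illustration; the study does not report how the random split was implemented.

```python
import random


def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Randomly assign each unique patient to exactly one of two sets.

    Splitting on patient identifiers (not encounters) guarantees that no
    patient appears in both the training and validation sets.
    """
    unique_ids = sorted(set(patient_ids))   # deterministic starting order
    rng = random.Random(seed)               # hypothetical fixed seed
    rng.shuffle(unique_ids)
    n_train = round(train_fraction * len(unique_ids))
    train = set(unique_ids[:n_train])
    validation = set(unique_ids[n_train:])
    return train, validation
```

Deduplicating identifiers before shuffling matters when the input is an encounter-level table in which the same patient appears many times.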

Algorithm Development
To develop an algorithm on the basis of EHR variables alone (EHR algorithm), we fitted a logistic regression model with the adaptive LASSO algorithm to ensure consistent variable selection. 34 To develop an algorithm on the basis of PROs alone (PRO algorithm), we fitted a logistic regression model in which all of the PROs were included as covariates, with observed 180-day mortality as the outcome. To develop an algorithm that includes both EHR and PRO variables (EHR + PRO algorithm), we applied a two-phase method.

CONTEXT
Key Objectives
To train and compare algorithms on the basis of electronic health record (EHR) data alone, patient-reported outcome (PRO) data alone, and EHR plus PRO data, to estimate 6-month risk of mortality among patients seen in routine oncology practice.

Statistical Analysis
We used descriptive statistics to compare the characteristics of the study population, stratified by whether PROs were collected. All algorithm analyses were performed separately for the PCAM and PAH cohorts using RStudio software. We first explored correlations among individual PRO features using the aggregated PCAM and PAH data. We then fit logistic regression models with 180-day mortality as the outcome and each PRO as the only covariate.
We also fit two-variable logistic regression models that assessed the association between each PRO and mortality, adjusted for the continuous 180-day mortality risk from the EHR algorithm. These exploratory analyses informed independent associations between PROs and mortality and the potential of PROs to augment ML performance.
Then the performance of the three different algorithms (EHR, PRO, and EHR + PRO) was assessed by calculating the area under the curve (AUC) and the area under the precision-recall curve (AUPRC), our primary performance metrics. True-positive rate (TPR) and false-positive rate at a previously specified 10% risk threshold 29 were secondary performance metrics. The 95% CIs for each performance metric were derived using bootstrapping, where each of the two cohorts was repeatedly sampled with replacement to generate 1,000 data sets of the same size. Performance for the EHR model was evaluated for all individuals in the testing set for PCAM (n = 2,566) and in the entire cohort for PAH (n = 3,795). Performance of the PRO and EHR + PRO models was evaluated only for those who had complete PRO data (n = 1,387 for PCAM and n = 1,193 for PAH). Because estimation of the performance metrics for the EHR + PRO algorithm corrected for the potential nonrepresentativeness of the subset of individuals with complete PRO data, the EHR + PRO results are representative of the complete test set. As a sensitivity analysis, we obtained predictive accuracy metrics for all models from the test set for PCAM using only the subset of individuals with PRO data available (n = 1,387). Finally, we conducted a decision curve analysis (see the Data Supplement for methodology) to assess the clinical impact of using each model to identify high-risk patients for the purpose of directing earlier supportive care. 38-40
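The performance metrics and bootstrap intervals described above can be illustrated with a short Python sketch. It is not the study code: the function names are hypothetical, the percentile method for the interval and the use of "risk at or above the threshold" to flag patients are assumptions, and the pairwise AUC is chosen for readability, not speed.

```python
import random


def auc(y_true, y_score):
    """AUC as the share of (death, survivor) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_fpr(y_true, y_score, threshold=0.10):
    """True- and false-positive rates when risk >= threshold is flagged (assumed rule)."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, fp / n_neg


def bootstrap_ci(y_true, y_score, metric=auc, n_boot=1000, seed=0):
    """Percentile 95% CI from resamples of the same size, drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:      # resample lacks one class; draw again
            continue
        stats.append(metric(ys, [y_score[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

For example, `bootstrap_ci(y, risk)` returns the lower and upper bounds for the AUC, and passing a different `metric` callable reuses the same resampling for other statistics.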

Cohort Description
The study cohort consisted of 8,555 patients who had 50,590 encounters from the tertiary oncology practice and 3,795 patients who had 32,805 encounters from the community oncology practice (median encounters per patient, 4; interquartile range, 2-7 during the study period; Table 1).

Correlation Among PRO Variables
In the combined tertiary and community oncology practice cohorts, decreased performance status was strongly correlated with fatigue (r = 0.69), decreased appetite (r = 0.5), and poorer quality of life (r = 0.58); fatigue was also strongly correlated with poorer quality of life (r = 0.6) and decreased appetite (r = 0.51; Fig 1). Increased anxiety was strongly correlated with increased sadness (r = 0.72). The correlation for all other PRO variable pairs was weak or moderate (r < 0.5). These results were consistent in practice-specific subset analyses (Appendix Figs A3 and A4).

PRO Associations With Observed Mortality
In  (Fig 2). After adjusting for EHR mortality risk, associations between adverse PROs and observed mortality remained significant for performance status, quality of life, fatigue, shortness of breath, anxiety, sadness, constipation, decreased appetite, and nausea (range of adjusted ORs 1.18-1.53; Appendix Table A3). We observed a similar pattern with community oncology practice data, although fewer associations were statistically significant (Appendix Table A4). Adverse PROs were also associated with higher EHR mortality risk for all PROs except rash.

Algorithm Performance
The final EHR + PRO model included the logit of the predicted probabilities from the EHR model, performance status, quality of life, numbness and tingling, and nausea (Appendix Fig A5). For the tertiary oncology practice data, the AUC of the EHR + PRO algorithm (0.86; 95% CI, 0.85 to 0.87) was significantly higher than that of the EHR (0.82; 95% CI, 0.81 to 0.83) and PRO (0.74; 95% CI, 0.73 to 0.75) algorithms (Fig 3A). 95% CI, 0.10 to 0.12) algorithms (Fig 3D). The results were similar in the community oncology practice cohort (Figs 3A-D). In the sensitivity analysis restricted to patients with PRO data, the EHR + PRO model had consistently higher performance than the PRO model (Appendix Table A5).
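To make the role of the logit-transformed EHR prediction concrete, the sketch below shows one way a second-stage risk could be computed from the EHR probability and a handful of PRO scores. It assumes the second stage is a logistic model, which the text implies but does not state outright; every coefficient, PRO name, and function name here is hypothetical and none is a fitted value from the study.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of the logit: map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ehr_pro_risk(ehr_probability, pro_scores, intercept, ehr_weight, pro_weights):
    """Combine the EHR model's prediction with PRO scores on the log-odds scale.

    All weights are placeholders supplied by the caller (hypothetical),
    not coefficients reported by the study.
    """
    linear = intercept + ehr_weight * logit(ehr_probability)
    for name, weight in pro_weights.items():
        linear += weight * pro_scores[name]
    return expit(linear)
```

With an intercept of 0, an EHR weight of 1, and no PRO terms, the function returns the EHR probability unchanged, which shows that the PRO terms act as adjustments to the EHR-based log-odds.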

Decision Curve Analysis
In both the tertiary and community oncology practice data sets, the decision curve for the EHR + PRO algorithm dominated the decision curves for the EHR and PRO algorithms, indicating that the EHR + PRO algorithm achieves greater clinical utility than the EHR and PRO algorithms regardless of risk preferences (Fig 4).

Although routine PRO collection is recommended by consensus guidelines for clinical symptom management and toxicity monitoring during clinical trials, 24,46 use of routinely collected PROs as part of risk stratification, including prognostic risk stratification, is rare in practice. Prior retrospective studies have found that adverse quality of life and symptoms such as depression, fatigue, and pain are independently associated with poorer survival. 21,47,48 However, few studies have demonstrated the independent prognostic value of PROs in contemporary machine learning algorithms. Our study suggests that PROs are only modestly correlated with EHR-predicted mortality risk, and there is likely additional independent prognostic value of PROs that would be of benefit in ML algorithms. Although natural language processing for clinician notes is another potential option to elicit symptoms, there is significant discordance between actual patient-reported symptoms and clinicians' documentation in the EHR. 49,50 Relying on routinely collected PROs is likely a better way to capture symptoms to maximally improve performance of predictive algorithms.
A strength of our two-phase methodology is its flexible approach, using PRO data when available and EHR data for all patients. This differs from traditional complete-case analyses, which may not use representative populations, or imputation-based approaches, which would perform poorly in a setting with a high missingness of PROs. Other advantages of this two-phase methodology are detailed in the Data Supplement. There are several potential limitations to this study. First, although we trained EHR + PRO algorithms across academic and community oncology cohorts, validation across other institutions, including those with greater Hispanic representation, would be valuable. However, the EHR features used in our models are all commonly available in structured data fields in all health system EHRs, and the PRO features we used were based on standard instruments using Likert scale values. We did not derive any features from unstructured data, and thus, we would not expect semantic differences in coding between different systems. Nevertheless, our approach should be externally validated as other issues of data quality, including completeness of features and heterogeneity in coding practices between institutions, are well known and should be accounted for. 51,52 Second, we did not validate a specific model in an external institutional cohort, but rather used a two-phase approach to test similar algorithms in two unique practices. This approach is justified because the purpose of our study was not to validate a specific algorithm, but rather to validate the conclusion that integrating PROs into routine prognostic algorithms improves risk prediction. Third, there is no gold standard reference for mortality prediction, and it is unclear how our EHR + PRO model compares with other published mortality prediction tools. 
However, we used the same features used in a validated EHR algorithm that is in routine use in medical oncology practices within our cancer center to prompt serious illness conversations. 28 Fourth, although we expect that our institutional registry and Social Security death data captured most deaths, we were unable to use more robust death data including National Death Index and obituary data. Fifth, although algorithm performance in our PCAM sample is reported on a typical holdout test set, we were unable to use a train/test split using the PAH data because of the much smaller number of cases in that data set.
In conclusion, among 12,350 patients with cancer seen in tertiary and community oncology practices, ML algorithms to predict short-term mortality that integrated routinely collected patient-reported outcomes with electronic health record features significantly improved predictive performance, compared with algorithms on the basis of EHR or PRO data alone. Our findings suggest that PROs can significantly improve performance of predictive algorithms in oncology.

DISCLAIMER
The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

DATA SHARING STATEMENT
The data that support the findings of this study are available on request. All statistical analysis was performed in R version 3.6.0. The validation for the EHR algorithm has been previous published (https://jamanetwork. com/journals/jamaoncology/article-abstract/2770698), and source code is available at https://github.com/pennsignals/eol-onc.        Abbreviations: PRO, patient-reported outcome; Q, quarter; SD, standard deviation. a Linear model, where outcome is the categories of distribution of mortality risk (quarter 1-quarter 4), and predictor is the mean of PRO in each quarter.